Timestamp Synchronization for Event Traces of Large-Scale Message-Passing Applications
Abstract
Identifying wait states in event traces of message-passing applications requires measuring temporal displacements between concurrent events. In the absence of synchronized hardware clocks, linear interpolation techniques can already account for differences in offset and drift, assuming that the drift of an individual processor is not time-dependent. However, inaccuracies and drifts that vary over time can still cause violations of the logical event ordering. The controlled logical clock algorithm accounts for such violations in point-to-point communication by shifting message events in time as much as needed while trying to preserve the length of intervals between local events. In this article, we describe how the controlled logical clock is extended to collective communication to enable a more complete correction of realistic message-passing traces. In addition, we present a parallel version of the algorithm that is intended to scale to thousands of application processes and outline its implementation within the framework of the Scalasca toolkit.
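The linear interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each process measures its offset to a master clock once at the start and once at the end of the run, and that the drift is constant in between. The names `OffsetMeasurement`, `make_linear_correction`, and `violates_clock_condition` are hypothetical and not taken from the paper or from Scalasca. The final check only detects an ordering violation for a single message; it does not implement the controlled logical clock, which additionally shifts events while trying to preserve local interval lengths.

```python
from dataclasses import dataclass


@dataclass
class OffsetMeasurement:
    local_time: float  # local clock reading when the offset was measured
    offset: float      # master_time - local_time at that moment


def make_linear_correction(start: OffsetMeasurement, end: OffsetMeasurement):
    """Return a function that maps local timestamps onto the master time line.

    Assumption: the drift is constant between the two measurements.
    """
    span = end.local_time - start.local_time
    if span <= 0:
        raise ValueError("end measurement must be later than start measurement")
    drift = (end.offset - start.offset) / span

    def correct(t: float) -> float:
        # Initial offset plus the drift accumulated since the first measurement.
        return t + start.offset + drift * (t - start.local_time)

    return correct


def violates_clock_condition(send_time: float, recv_time: float,
                             min_latency: float = 0.0) -> bool:
    """True if a message appears to arrive earlier than physically possible."""
    return recv_time < send_time + min_latency


# Hypothetical measurements for one process.
start = OffsetMeasurement(local_time=0.0, offset=2.0)
end = OffsetMeasurement(local_time=100.0, offset=3.0)
correct = make_linear_correction(start, end)
corrected_recv = correct(50.0)
```

If the corrected receive timestamp still precedes the matching send timestamp plus a minimum latency, interpolation alone was not sufficient, which is the situation the controlled logical clock is meant to repair.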
Similar resources
Scalable timestamp synchronization for event traces of message-passing applications
Event traces are helpful in understanding the performance behavior of message-passing applications since they allow the in-depth analysis of communication and synchronization patterns. However, the absence of synchronized clocks may render the analysis ineffective because inaccurate relative event timings may misrepresent the logical event order and lead to errors when quantifying the impact of...
Parallel Discrete Event Simulation on a Shared-Memory Multiprocessor
Many large-scale discrete event simulation computations for modeling telecommunication networks, computer systems, transportation grids, and a variety of other applications are excessively time consuming, and are a natural candidate for parallel execution. However, discrete event simulations are challenging to parallelize because cause-and-effect relationships determine dependencies between sim...
A scalable tool architecture for diagnosing wait states in massively parallel applications
When scaling message-passing applications to thousands of processors, their performance is often affected by wait states that occur when processes fail to reach synchronization points simultaneously. As a first step in reducing the performance impact, we have shown in our earlier work that wait states can be diagnosed by searching event traces for characteristic patterns. However, our initial s...
Timestamp synchronization of concurrent events
Supercomputing is a key technological pillar of modern science and engineering, indispensable for solving critical problems of high complexity. However, to effectively utilize the enormously complex large-scale computer systems available today, scientists and engineers need powerful and robust software development tools. One technique widely used by such tools is event tracing with a broad spec...
Implementing MPI Based Portable Parallel Discrete Event Simulation Support in the OMNeT++ Framework
In this paper, we introduce our Message Passing Interface (MPI) based Object-Oriented parallel discrete event simulation framework. The framework extends the capabilities of the OMNeT++ simulation system. In conjunction with this project, our research efforts also include the development of synchronization methods suitable for architectural properties of the distributed-memory and shared-memory...